Bayes Optimal Hyperplanes → Maximal Margin Hyperplanes
Authors
Abstract
Maximal margin classifiers are a core technology in modern machine learning. They have strong theoretical justifications and have shown empirical successes. We provide an alternative justification for maximal margin hyperplane classifiers by relating them to Bayes optimal classifiers that use Parzen window density estimates with Gaussian kernels. For any value of the smoothing parameter (the width of the Gaussian kernels), the Bayes optimal classifier defines a density over the space of instances. We define the Bayes optimal hyperplane to be the hyperplane decision boundary that gives the lowest probability of classification error relative to this density. We show that, for linearly separable data, as we reduce the smoothing parameter to zero, the Bayes optimal hyperplane converges to the maximal margin hyperplane. We also analyze the behavior of the Bayes optimal hyperplane for linearly non-separable data, showing that it satisfies a very natural optimality criterion. We explore the idea of using the hyperplane that is optimal relative to a density with some small non-zero kernel width, and compare the resulting hyperplane with the maximal margin and soft margin hyperplanes.
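The construction in the abstract is straightforward to prototype. Below is a minimal sketch (the function names are ours, NumPy assumed): each class-conditional density is a Parzen window estimate with isotropic Gaussian kernels of width sigma, and the Bayes optimal classifier picks the class whose estimated density is higher at the query point. The paper's Bayes optimal hyperplane is then the single hyperplane with lowest error probability under the density this model defines; for separable data, the abstract's result is that it approaches the maximal margin hyperplane as sigma shrinks to zero.

```python
import numpy as np

def parzen_log_density(x, points, sigma):
    """Log Parzen window density estimate at x, with isotropic Gaussian
    kernels of width sigma centered on one class's training points.
    (The Gaussian normalizing constant is identical for both classes
    and cancels in the comparison, so it is omitted.)"""
    m = -np.sum((points - x) ** 2, axis=1) / (2.0 * sigma ** 2)
    return m.max() + np.log(np.mean(np.exp(m - m.max())))  # stable log-mean-exp

def bayes_predict(x, pos, neg, sigma):
    """Bayes optimal rule under the two Parzen class densities (equal priors)."""
    return 1 if parzen_log_density(x, pos, sigma) >= parzen_log_density(x, neg, sigma) else -1

# Toy linearly separable data: as sigma shrinks, the decision boundary of
# bayes_predict moves toward the maximal margin hyperplane between the classes.
pos = np.array([[1.0, 1.0], [2.0, 0.5]])
neg = np.array([[-1.0, -1.0], [-2.0, -0.5]])
for sigma in (2.0, 0.5, 0.05):
    print(sigma, bayes_predict(np.array([0.3, 0.2]), pos, neg, sigma))
```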
Similar resources
Bayesian Classifiers Are Large Margin Hyperplanes in a Hilbert Space
Bayesian algorithms for neural networks are known to produce classifiers which are very resistant to overfitting. It is often claimed that one of the main distinctive features of Bayesian learning algorithms is that they don't simply output one hypothesis, but rather an entire probability distribution over a hypothesis set: the Bayes posterior. An alternative perspective is that they output a linear combination of classifiers, whose coefficients are given by Bayes' theorem. This can be ...
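The "linear combination of classifiers" view can be written down directly. Here is a sketch under the simplifying assumption of a finite hypothesis set (all names are ours): each hypothesis votes with weight proportional to its posterior, computed from its prior and data likelihood via Bayes' theorem.

```python
import numpy as np

def bayes_posterior_vote(hypotheses, priors, likelihoods, x):
    """Posterior-weighted linear combination of two-class classifiers.

    hypotheses:  list of functions h(x) -> {-1, +1}
    priors:      P(h) for each hypothesis
    likelihoods: P(D | h) for each hypothesis (D = training data)
    """
    posterior = np.asarray(priors, float) * np.asarray(likelihoods, float)
    posterior /= posterior.sum()                 # Bayes' theorem, normalized
    votes = np.array([h(x) for h in hypotheses])
    return int(np.sign(posterior @ votes))       # sign of the combination
```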
Large Margin DAGs for Multiclass Classification
We present a new learning architecture: the Decision Directed Acyclic Graph (DDAG), which is used to combine many two-class classifiers into a multiclass classifier. For an N-class problem, the DDAG contains N(N-1)/2 classifiers, one for each pair of classes. We present a VC analysis of the case when the node classifiers are hyperplanes; the resulting bound on the test error depends on N and on the margin ...
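The evaluation procedure this snippet describes is short. A sketch under our own naming, where the hypothetical `pairwise` table holds one trained two-class classifier per class pair:

```python
def ddag_predict(x, classes, pairwise):
    """Evaluate a Decision DAG: each of the N - 1 pairwise tests removes
    one class from contention; the survivor is the prediction.

    pairwise[(a, b)](x) returns +1 to keep class a, -1 to keep class b,
    where a precedes b in the original class ordering."""
    remaining = list(classes)
    while len(remaining) > 1:
        a, b = remaining[0], remaining[-1]
        if pairwise[(a, b)](x) > 0:
            remaining.pop()        # eliminate b
        else:
            remaining.pop(0)       # eliminate a
    return remaining[0]
```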
Median and center hyperplanes in Minkowski spaces--a unified approach
In this paper we will extend two known location problems from Euclidean n-space to all n-dimensional normed spaces, n ≥ 2. Let X be a finite set of weighted points whose affine hull is n-dimensional. Our first objective is to find a hyperplane minimizing (among all hyperplanes) the sum of weighted distances with respect to X. Such a hyperplane is called a median hyperplane with respect to X, and we w...
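In the Euclidean special case, the objective the snippet defines is a one-liner. A minimal sketch with our own names (the paper treats general normed/Minkowski spaces, where the distance would use the corresponding norm instead):

```python
import numpy as np

def weighted_distance_sum(w, b, X, weights):
    """Sum of weighted Euclidean distances from the rows of X to the
    hyperplane {x : w.x + b = 0}; a median hyperplane minimizes this
    objective over all (w, b) with w != 0."""
    distances = np.abs(X @ w + b) / np.linalg.norm(w)
    return float(weights @ distances)
```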
Finding the Defining Hyperplanes of Production Possibility Set with Variable Returns to Scale Using the Linear Independent Vectors
The Production Possibility Set (PPS) is defined as the set of all inputs and outputs of a system in which inputs can produce outputs. In Data Envelopment Analysis (DEA), it is highly important to identify the defining hyperplanes, and especially the strong defining hyperplanes, of the empirical PPS. Although DEA models can determine the efficiency of a Decision Making Unit (DMU), they...